[fix](be) Avoid unsigned underflow in JSON modify path#63579
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
|
OpenCode automated review failed and did not complete. Error: Review step was failure (possibly timeout or cancelled) Please inspect the workflow logs and rerun the review after the underlying issue is resolved. |
|
/review |
ghost
left a comment
There was a problem hiding this comment.
Review result: no blocking issues found.
Critical checkpoint conclusions:
- Goal/test: The PR removes the unsigned subtraction in the JSON modify parent-path loop. Existing json_insert/json_set/json_replace regression cases cover root paths, missing paths, array paths, NULL path handling, and wildcard errors; the author also reports diff check and object compilation.
- Scope/clarity: The modification is small, focused, and preserves existing behavior by caching the leg count and using j + 1 < legs_count.
- Concurrency/lifecycle/config/compatibility: Not applicable; this is a local expression function loop change with no shared state, lifecycle changes, config, persistence, or protocol/storage format changes.
- Parallel paths: The changed loop is the relevant json_modify implementation shared by insert/set/replace. Other get_leg_vector_size() - 1 occurrences are in different recursive json path handling and were not the same parent-path construction case.
- Error handling/invariants: The empty-path insert-parent branch remains guarded by the existing findValue root-path behavior plus DCHECK_GT. No ignored Status was introduced.
- Memory/performance: No new allocation or memory ownership pattern was introduced; caching legs_count avoids repeated calls in the modified block.
- Observability/docs: Not needed for this local correctness cleanup.
- User focus: No additional user-provided review focus was supplied.
|
run buildall |
TPC-H: Total hot run time: 31812 ms |
TPC-DS: Total hot run time: 172265 ms |
### What problem does this PR solve?
Issue Number: None
Problem Summary: JSON modify functions build a parent path by iterating to the element before the last JSON path leg. The loop used `get_leg_vector_size() - 1` on an unsigned size, which can underflow for an empty path expression even though normal root-path execution resolves to the root value before that branch. Cache the leg count once, assert the non-empty invariant for the insert-parent branch, and use `j + 1 < legs_count` to avoid unsigned subtraction while preserving behavior.
### Release note
None
### Check List (For Author)
- Test:
- Manual test: `git diff --check`
- Manual test: `DORIS_HOME=/mnt/disk7/hushenggang/doris-fix-spill ninja -C be/ut_build_ASAN -j 1 src/exprs/CMakeFiles/Exprs.dir/function/function_jsonb.cpp.o`
- Behavior changed: No
- Does this need documentation: No
0a1bc22 to
6f9182f
Compare
|
/review |
1 similar comment
|
/review |
ghost
left a comment
There was a problem hiding this comment.
Reviewed head SHA 6f9182f. No blocking issues found.
Critical checkpoint conclusions:
- Goal/test: The change avoids unsigned underflow in JSON modify parent-path construction while preserving existing behavior. Existing json_set/json_insert/json_replace regression suites already include root-path and missing-path cases that cover the relevant control flow; this PR also reports diff/compile checks.
- Scope: The modification is small and focused, caching the leg count and replacing the unsigned subtraction loop bound.
- Concurrency/lifecycle: No shared state, threading, or non-obvious lifecycle changes introduced.
- Configuration/compatibility: No config, protocol, storage-format, or compatibility changes.
- Parallel paths: The template implementation is shared by jsonb_set/jsonb_insert/jsonb_replace aliases, so the fix applies consistently to the affected modify paths.
- Conditions/error handling: The added DCHECK documents the non-empty-path invariant for the insert-parent branch; Status propagation remains unchanged.
- Testing/observability: No new observability needed for this local arithmetic-safety fix. Regression tests exist for the surrounding JSON modify functions; I did not run the full regression suite in this runner.
- User focus: No additional user-provided review focus.
ghost
left a comment
There was a problem hiding this comment.
No blocking issues found in this review.
Critical checkpoint conclusions:
- Goal/test: The PR avoids the unsigned
size_tunderflow when building the JSON modify parent path and pins macOS BE UT to JDK 17. The changed JSONB flow preserves root-path behavior because$resolves to the root before the missing-path insert branch; existing JSON modify regression coverage includes root-path cases. The PR reportsgit diff --checkand a targeted BE compile. - Scope/clarity: The change is small and focused: one loop condition rewrite plus one workflow setup step.
- Concurrency/lifecycle: No new shared state, locks, threads, static initialization, or lifecycle-sensitive objects are introduced.
- Configuration/compatibility: No Doris runtime config, storage format, FE-BE protocol, or rolling-upgrade compatibility change is involved.
- Parallel paths: The JSON insert/set/replace implementation is templated through the same modified path, so the fix applies to all three modify variants.
- Conditions/error handling: The new non-subtractive loop avoids the underflow path;
DCHECK_GT(legs_count, 0)is consistent with the insert-parent invariant and root paths do not enter that branch for set/replace. Status-returning calls remain checked. - Test coverage/results: No new regression output is added, but the existing JSON modify suites cover root and invalid path behavior; I also verified the PR diff with
git diff --check. - Observability/performance: No new observability is needed for this local arithmetic fix; the cached leg count removes repeated size lookup and does not add meaningful overhead.
- Data correctness/transactions/persistence: The PR does not affect storage visibility, transaction processing, persistence, delete bitmaps, or data write atomicity.
User focus response: No additional user-provided review focus was present.
7d2c8c0 to
17bc2ed
Compare
|
/review |
17bc2ed to
a3cbe49
Compare
|
/review |
ghost
left a comment
There was a problem hiding this comment.
Review result: no blocking issues found. The JSONB change is a focused unsigned-underflow avoidance that preserves the existing root-path behavior because the insert-parent branch is only reached when the target path is not found; existing tests already cover root-path json_insert/json_set/json_replace cases. The macOS workflow change explicitly pins JDK 17 via setup-java and uses Homebrew LLVM before running BE UT, which matches the stated runner failures.\n\nCritical checkpoint conclusions:\n- Goal/test: the PR addresses JSON modify unsigned-size underflow risk and macOS BE UT environment setup; existing JSON modify regression cases cover relevant root and path behavior, and the author reports diff check plus targeted compilation.\n- Scope: changes are small and focused.\n- Concurrency/lifecycle: no new concurrent state, threads, locks, or special object lifecycle introduced.\n- Configuration/compatibility/protocol: no new Doris config, storage format, RPC, or FE-BE protocol compatibility concern.\n- Parallel paths: json_insert/json_set/json_replace share the templated path, so the BE fix applies consistently; workflow change is specific to macOS BE UT.\n- Error handling/memory: no ignored Status introduced; the existing DorisUniqueBufferPtr tracking remains unchanged.\n- Tests/results: no generated result files changed; no additional user focus points were provided.\n- Observability/performance: no new runtime observability need; the cached leg count avoids repeated lookup without changing hot-path semantics.
|
run buildall |
ghost
left a comment
There was a problem hiding this comment.
I found one blocking issue in the macOS BE UT workflow change. The JSON path arithmetic change is small and matches the stated goal; I did not find an additional correctness issue in that code path.
Critical checkpoints:
- Goal/test proof: The JSON underflow guard is addressed, but the workflow goal is not achieved because the PR's own
BE UT (macOS)check still fails. - Scope/focus: The code changes are small and focused.
- Concurrency/lifecycle/config/compatibility/FE-BE protocol: No relevant new concurrency, lifecycle, dynamic config, compatibility, or protocol concerns found.
- Parallel paths: The workflow now needs consistent runner/toolchain/thirdparty architecture handling; the current path remains broken.
- Tests/results: Existing CI result shows the changed macOS workflow fails; author manual tests do not cover this workflow end-to-end.
- Observability/transaction/data writes: Not applicable.
- Performance: No relevant performance concern found.
User focus: No additional user-provided review focus was specified.
| tar -xvf doris-thirdparty-prebuilt-darwin-x86_64.tar.xz | ||
| popd | ||
|
|
||
| export JAVA_HOME="${JAVA_HOME_17_X64%\/}" |
There was a problem hiding this comment.
This workflow still fails on the PR check after this change. The log for BE UT (macOS) shows the macos-15 runner is arm64 (/opt/homebrew, ARM_MARCH is armv8-a+crc, Temurin path under arm64), and these new exports make CMake configure a native arm64 clang build. However the job still downloads and extracts doris-thirdparty-prebuilt-darwin-x86_64.tar.xz, and the build then stops with ninja: error: .../thirdparty/installed/lib64/libarrow_dataset.a, needed by 'test/doris_be_test', missing and no known rule to make it. So the workflow fix is incomplete: please either run this job on an Intel macOS label / x86_64 toolchain, or switch the downloaded third-party archive and toolchain setup to the matching arm64 package so ./run-be-ut.sh can actually build.
### What problem does this PR solve?
Issue Number: None
Problem Summary: The macOS BE UT workflow relies on `JAVA_HOME_17_X64` before running `run-be-ut.sh`. On the current macOS runner the job resolves Java 25 instead, and the BE UT script rejects it because the build requires JDK 17. After switching Java, the job also needs the Homebrew LLVM toolchain to make OpenMP detection succeed, and arm64 runners must download the matching darwin-arm64 third-party archive instead of the x86_64 archive. Set up Temurin JDK 17 explicitly, point Doris to Homebrew LLVM, and select the Darwin third-party archive by runner architecture.
### Release note
None
### Check List (For Author)
- Test:
- Manual test: `git diff --check`
- Manual test: `DORIS_HOME=/mnt/disk7/hushenggang/doris-fix-spill ninja -C be/ut_build_ASAN -j 1 src/exprs/CMakeFiles/Exprs.dir/function/function_jsonb.cpp.o`
- Behavior changed: No
- Does this need documentation: No
a3cbe49 to
de24ddc
Compare
|
/review |
ghost
left a comment
There was a problem hiding this comment.
Review result: no new blocking issues found in the current GitHub PR diff.
Critical checkpoint conclusions:
- Goal and proof: the PR targets unsigned underflow avoidance in
jsonb_modifypath handling and macOS BE UT runner setup. The JSONB change is a minimal equivalent loop rewrite, and the workflow now selects Java 17, Homebrew LLVM, and the Darwin third-party archive by runner architecture. Evidence is limited to the author-reported compile/diff checks; no end-to-end BE UT result is included. - Scope: changes are small and focused.
- Concurrency: no new concurrent code or shared mutable lifecycle was introduced.
- Lifecycle/static initialization: no new static/global initialization dependency was introduced.
- Configuration: no Doris runtime config item was added.
- Compatibility/storage format: no protocol, persisted metadata, or storage-format compatibility change was introduced.
- Parallel paths: no additional JSON modify registration path appears to need the same local loop change beyond this shared implementation.
- Conditional checks: the new
DCHECK_GT(legs_count, 0)is on the insert-parent branch;findValue()returns the root for empty paths, so that branch should only execute for non-empty paths. - Tests: test coverage is weak for the workflow fix because the PR does not show a successful macOS BE UT run, but I did not find a distinct code issue beyond the existing review thread context.
- Observability: not applicable; no runtime behavior requiring new logs/metrics.
- Transaction/persistence/data writes: not applicable.
- FE-BE variable passing: not applicable.
- Performance: no meaningful regression found; the JSONB loop now caches the leg count once.
Existing review context: I read the existing inline thread about the macOS job previously downloading the x86_64 third-party archive on an arm64 runner. The current PR diff now selects doris-thirdparty-prebuilt-darwin-${arch}.tar.xz, so I did not duplicate that already-known concern.
User focus: no additional user-provided review focus was specified.
|
run buildall |
TPC-H: Total hot run time: 31863 ms |
TPC-DS: Total hot run time: 173582 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
What problem does this PR solve?
Issue Number: None
Problem Summary: JSON modify functions build a parent path by iterating to the element before the last JSON path leg. The loop used
get_leg_vector_size() - 1on an unsigned size, which can underflow for an empty path expression even though normal root-path execution resolves to the root value before that branch. This patch caches the leg count once, asserts the non-empty invariant for the insert-parent branch, and usesj + 1 < legs_countto avoid unsigned subtraction while preserving behavior.The macOS BE UT workflow also needed runner setup updates: the current runner resolved Java 25 while the BE UT script requires JDK 17, OpenMP detection needs the Homebrew LLVM toolchain, and arm64 runners must download the matching Darwin third-party archive. This patch explicitly sets up Temurin JDK 17, points Doris to Homebrew LLVM, and selects the Darwin third-party archive by runner architecture before running BE UT.
Release note
None
Check List (For Author)
git diff --checkDORIS_HOME=/mnt/disk7/hushenggang/doris-fix-spill ninja -C be/ut_build_ASAN -j 1 src/exprs/CMakeFiles/Exprs.dir/function/function_jsonb.cpp.o